University of Alberta NEW REPRESENTATIONS AND APPROXIMATIONS FOR SEQUENTIAL DECISION MAKING UNDER UNCERTAINTY
Abstract
This dissertation addresses the challenge of scaling up algorithms for sequential decision making under uncertainty. I developed new approximation strategies for planning and learning in the presence of uncertainty that retain useful theoretical properties while allowing larger problems to be tackled than is practical with exact methods. In particular, my research tackles three outstanding issues in sequential decision making in uncertain environments: performing stable generalization during off-policy updates, balancing exploration with exploitation, and handling partial observability of the environment.
The first key contribution of my thesis is the development of novel dual representations and algorithms for planning and learning in stochastic environments. This dual view offers a coherent and comprehensive approach to optimal sequential decision making problems, provides an alternative to standard value-function-based techniques, and opens new avenues for solving such problems. In particular, I have shown that dual dynamic programming algorithms can avoid the divergence problems associated with the standard primal approach, even in the presence of approximation and off-policy updates.
Another key contribution of my thesis is the development of a practical action selection strategy that addresses the well-known exploration versus exploitation tradeoff in reinforcement learning. The idea is to exploit the information in a Bayesian posterior to select actions intelligently by growing an adaptive, sparse lookahead tree. This technique evaluates actions while taking into account the effects they might have on future knowledge as well as on future reward, and it outperforms current selection strategies.
Finally, my thesis develops a new approach to approximate planning in partially observable Markov decision processes. Here the challenge is to overcome the exponential space required by standard value iteration. For this problem, I introduced a new quadratic upper bound approximation that can be optimized by semidefinite programming. This approach achieves competitive approximation quality while maintaining a compact representation, requiring computational time and space that are only linear in the number of decisions. Overall, my dissertation research developed new tools for computing optimal sequential decision strategies in stochastic environments and contributed significant progress on three key challenges in reinforcement learning.
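As a rough illustration of how a dual view can sit alongside the usual value-function view, the sketch below evaluates a fixed policy in two ways. The abstract does not spell out the dual formulation, so this assumes one common choice: the dual object is the normalized discounted state-visit distribution M = (1 - gamma)(I - gamma P)^{-1}, whose rows sum to one. The two-state transition matrix `P`, reward vector `r`, discount `gamma`, and all function names are hypothetical values chosen for the example, not taken from the thesis.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def primal_evaluate(P, r, gamma, iters=2000):
    """Primal policy evaluation: iterate v <- r + gamma * P v."""
    n = len(r)
    v = [0.0] * n
    for _ in range(iters):
        v = [r[i] + gamma * sum(P[i][j] * v[j] for j in range(n))
             for i in range(n)]
    return v


def dual_evaluate(P, gamma, iters=2000):
    """Assumed dual update: iterate M <- (1 - gamma) * I + gamma * P M.

    Every iterate stays row-stochastic, because each row is a convex
    combination of a unit vector and a probability distribution.
    """
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = identity
    for _ in range(iters):
        PM = mat_mul(P, M)
        M = [[(1 - gamma) * identity[i][j] + gamma * PM[i][j]
              for j in range(n)] for i in range(n)]
    return M


# Hypothetical two-state Markov chain induced by a fixed policy.
P = [[0.9, 0.1],
     [0.2, 0.8]]
r = [1.0, 0.0]
gamma = 0.9

v_primal = primal_evaluate(P, r, gamma)
M = dual_evaluate(P, gamma)

# Recover values from the dual representation: v = M r / (1 - gamma).
v_dual = [sum(M[i][j] * r[j] for j in range(len(r))) / (1 - gamma)
          for i in range(len(r))]
```

Under these assumptions both routes give the same values, but the dual iterate is always a set of probability distributions and therefore stays bounded. That boundedness is one intuition for why dual updates might resist divergence, although the sketch does not demonstrate the approximate or off-policy case discussed in the abstract.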
Similar resources
Learning Efficient Representations for Reinforcement Learning
Markov decision processes (MDPs) are a well studied framework for solving sequential decision making problems under uncertainty. Exact methods for solving MDPs based on dynamic programming such as policy iteration and value iteration are effective on small problems. In problems with a large discrete state space or with continuous state spaces, a compact representation is essential for providing...
Classification and properties of acyclic discrete phase-type distributions based on geometric and shifted geometric distributions
Acyclic phase-type distributions form a versatile model, serving as approximations to many probability distributions in various circumstances. They exhibit special properties and characteristics that usually make their applications attractive. Compared to acyclic continuous phase-type (ACPH) distributions, acyclic discrete phase-type (ADPH) distributions and their subclasses (ADPH family) have ...
Efficient Linear Approximations to Stochastic Vehicular Collision-Avoidance Problems
The key components of an intelligent vehicular collision-avoidance system are sensing, evaluation, and decision making. We focus on the latter task of finding (approximately) optimal collision-avoidance control policies, a problem naturally modeled as a Markov decision process. However, standard MDP models scale exponentially with the number of state features, rendering them inept for large-sca...
The Role and Impact of Mental Simulation in Design
Although theories of mental simulations have used different formulations of the premises of ‘thought experiments’, they can be fitted under a minimalist hypothesis stating that mental simulations are run under situations of uncertainty to turn that uncertainty into approximate answers. Three basic assumptions of mental simulations were tested by using naturalistic data from engineering design. ...
Conditional probability generation methods for high reliability effects-based decision making
Decision making is often based on Bayesian networks. The building blocks for Bayesian networks are its conditional probability tables (CPTs). These tables are obtained by parameter estimation methods, or they are elicited from subject matter experts (SME). Some of these knowledge representations are insufficient approximations. Using knowledge fusion of cause and effect observations lead to bet...
Publication date: 2007